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ABSTRACT 

Eukaryotic genomes are replicated from multiple 
DNA replication origins. We present complementary 
deep sequencing approaches to measure origin 
location and activity in Saccharomyces cerevisiae. 
Measuring the increase in DNA copy number 
during a synchronous S-phase allowed the precise 
determination of genome replication. To map origin 
locations, replication forks were stalled close to 
their initiation sites; therefore, copy number enrich- 
ment was limited to origins. Replication timing 
profiles were generated from asynchronous 
cultures using fluorescence-activated cell sorting. 
Applying this technique we show that the replication 
profiles of haploid and diploid cells are indistin- 
guishable, indicating that both cell types use the 
same cohort of origins with the same activities. 
Finally, increasing sequencing depth allowed the 
direct measure of replication dynamics from an ex- 
ponentially growing culture. This is the first time this 
approach, called marker frequency analysis, has 
been successfully applied to a eukaryote. These 
data provide a high-resolution resource and meth- 
odological framework for studying genome biology. 

INTRODUCTION 

Complete replication of the genome is a requirement 
for successful cell division and therefore is fundamental 
to all life. The efficient replication of eukaryotic 
genomes requires coordinated initiation from thousands 
of DNA replication origins (1,2). Such coordinated 



origin activation leads to stereotypic replication-timing 
programs in which replication timing correlates with chro- 
matin structure, gene expression, genome evolution and 
other critical aspects of genome metabohsm. A major 
challenge in replication biology is the mapping of replica- 
tion timing programs and the study of how they are 
regulated (3). 

Individual origins vary in their time of activation during 
S-phase, creating a distinct spatial and temporal pattern of 
genome replication. Differences in origin activation time 
may reflect differences in affinity for a limited number of 
activating molecules, modulated by subnuclear position- 
ing and chromatin state (4). In yeast, several mutations 
have been characterized that disrupt the temporal order of 
genome replication. Some of these mutants globally alter 
replication timing, either by changes to chromatin modi- 
fications or subnuclear positioning (5,6). Other mutations 
only alter the replication time of specific chromosomal 
domains. To date, mutants have been isolated that 
change the replication time of telomeres from late to 
early S-phase (7,8) and different mutants that delay the 
normally early replication time of centromeres (9). 

Characterizing genome replication requires methodo- 
logies to locate replication origins and determine their 
activation time (10). Origins have been localized 
genome-wide by a range of approaches, including chroma- 
tin immunoprecipitation (ChlP) of ORG, MCM or incor- 
porated BrdU, origin 'bubble-trap' assays, sequencing of 
Okazaki fragments, plasmid-based assays, measuring 
abundance of small nascent strands and replication in 
hydroxyurea (HU) (11-16). The temporal order of 
genome replication has been measured during a synchron- 
ous S-phase by tracking the progress of replication fork 
proteins, the incorporation of labeled nucleotides or direct 
detection of the products of DNA replication (17-19). 
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Alternatively, cell sorting allows enrichment of replicating 
(and nonreplicating) cell populations without the require- 
ment for cell cycle synchronization (20). Together these 
methods have allowed the measurement of genome reph- 
cation dynamics in a range of mutants and various species, 
including Saccharomyces cerevisiae, Schizosaccharomyces 
pombe and cell lines from Drosophila melanogaster, Mas 
musculus. Homo sapiens and Arabidopsis thaliana (3). 
Although a variety of approaches have been used to 
map replication origins and measure replication kinetics, 
these techniques are often used in an ad hoc manner, and 
there have been few direct comparisons between them. 

Here we present a direct comparison of different deep 
sequencing-based approaches for studying genome repli- 
cation. First, we identified origin locations and activities 
using a checkpoint mutant subjected to the replication 
inhibitor HU. Second, a highly synchronous cell popula- 
tion was used to measure precisely wild-type replication 
dynamics. Third, replication dynamics in an asynchronous 
population were determined by sorting replicating cells. 
Finally, we show that replication dynamics can be 
directly measured from an exponentially growing ceU 
population by direct sequencing of the genomic DNA. 
The use of a single strain background allows direct com- 
parison between these approaches and provides a meth- 
odological and data resource for future investigation of 
genome replication. 

MATERIALS AND METHODS 

Yeast strains and methods 

All strains used were from the W303 background and are 
hsted in Supplementary Table S2. CeUs were grown in 
standard rich YPD medium. For cell cycle synchroniza- 
tion, alpha factor was added to a final concentration of 
200 nM; release was via resuspension in media containing 
0.2mg/ml pronase. Flow cytometry samples were fixed in 
70% ethanol, washed with 50 mM sodium citrate, 
sonicated and treated with RNase A and proteinase K 
before staining with Ix SYTOX® green nucleic acid 
stain (Invitrogen). To kill culture samples for deep 
sequencing, sodium azide (final concentration 0.1%) and 
EDTA (20 mM) were added. Samples for marker fre- 
quency analysis (MFA) were grown at 30°C and harvested 
from exponential (ODgoo of 0.7) and stationary phase 
(ODgoo >4.0). RNaseA and proteinase K were used at 
final concentrations of 0.2 and 0.5mg/ml, respectively, 
throughout. All DNA samples for deep sequencing were 
resuspended in TE (10 mM Tris, pH8, 1 mM EDTA). 

HU experiment 

CeUs were grown, arrested and released at 30° C into 
200 mM HU. Cell pellets were resuspended in 5 ml cold 
freshly prepared NIB buffer (17% glycerol, 50 mM 
MOPS, 150mM potassium acetate, 2mM magnesium 
chloride, 500 mM spermidine, 150mM spermine). After 
addition of a similar volume of glass beads, samples 
were vortexed vigorously for 30 s, followed by 30 s 
coohng in an ice-water bath. The vortex-coohng cycle 
was repeated until cell breakage was >95%. The extract 



was recovered from the glass beads and gently resus- 
pended in 5 ml G2 buffer (QIAGEN). The sample was 
treated with RNase A and proteinase K followed by cen- 
trifugation. The supernatant was supplemented with 5 ml 
QBT buffer (QIAGEN) and then purified using an 
equilibrated QIAGEN Genomic-Tip 100/G column ac- 
cording to manufacturers' instructions. 

Time course experiment 

Cells were grown, arrested and released at 23°C and 
samples were collected every 2.5 min for flow cytometry 
analysis and 5 min for isolation of genomic DNA. Samples 
for deep sequencing were resuspended in 1.6 ml of lysis 
buffer (10 mM Tris, pH8, 1 mM EDTA, 100 mM sodium 
chloride, 1% sodium dodecyl sulphate (SDS), 2% Triton 
X-100) to which 1.6 ml of glass beads, 0.8 ml of phenol and 
0.8 ml chloroform were added. The sample was vortexed 
for 2 min, then the aqueous phase was recovered and 
treated with proteinase K and RNase A. The DNA was 
recovered by ethanol precipitation. 

Sort-seq 

Cells were grown at 30°C to an ODgoo of 0.65-0.85. Cells 
were pelleted, washed twice with water and fixed in 70% 
ethanol. Fixed cells were pelleted, washed twice and resus- 
pended in 50 mM sodium citrate, sonicated and treated 
with RNase A and proteinase K. Cells were pelleted and 
resuspended in 50 mM sodium citrate containing lOx 
SYTOX® green nucleic acid stain (Invitrogen). At least 
30 million cells were sorted from a particular cell cycle 
stage using a MoFlo Sorter (Coulter Beckman). The fluor- 
escence-activated cell sorting (FACS) machine was set up 
according to the manufacturer's instructions. An argon 
laser (488 nm) was used to excite the SYTOX® green 
stained cells. Data acquired in the FLl channel was 
gated to remove background noise, cell debris and 
doublets. The FLl histogram plot was used to set the 
gates to trigger the sorting. These were optimized for 
the yeast strains and adjusted manually throughout the 
sorting process as required. G2 phase ceUs were selected 
as the nonreplicating control owing to their greater abun- 
dance compared with Gl-phase cells. The purity of the 
sorted cell fractions was confirmed by flow cytometry. 
Sorted cells were spheroplasted with Zymolyase (final con- 
centration of 1 nig/ml) and then treated with SDS, pro- 
teinase K and RNase A. DNA was purified by phenol 
chloroform extraction followed by ethanol precipitation. 

ChlP-seq 

Chip was performed against FLAG-tagged Mcm4 using 
an anti-FLAG monoclonal antibody as described previ- 
ously (21,22). 

Replication profiles 

To generate rephcation timing profiles, the ratio of 
uniquely mapped reads in the replicating samples to the 
nonreplicating samples was calculated. Custom Perl 
scripts (available upon request) were used to independ- 
ently calculate this ratio for every 1 kb window. 
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Windows where fewer than a quarter of the expected 
number (based on total read number and the genome 
size) of reads were mapped in either sample were 
excluded. Differences in absolute read number were cor- 
rected for to give an average copy number value across the 
genome of one. 

For the assessment of precision, the number of reads 
obtained from the 90min sample were compared with 
the Gl sample. For the HU, sort-seq and MFA data 
were empirically normalized to give a minimum value of 
1. For the time course experiment, each time point was 
normalized to the bulk DNA content as measured by flow 
cytometry. In each case, it was confirmed that the 
maximum DNA content did not exceed two genome 
copies. Comparisons between data sets were performed 
using the R environment, and quoted correlations are 
Pearson's coefficients. 

To calculate Trep profiles from replication time course 
data, a curve F[t] = c + [(d - c)/(l + Exp[b(t - e)])"f| was 
fitted for every 1 kb window. The time at which the 
DNA at a given chromosomal coordinate has been 
replicated in half the cells is Trep = e + ln((— 2(c — d)/ 
(3_2c))'/f-l)/b. 

Smoothing, peak calling and motif identification 

Where appropriate, replicating timing data were 
smoothed using a Fourier transformation method. The 
original signal was Fourier-transformed, the highest-fre- 
quency components were removed and the inverse Fourier 
transform was applied to obtain the smoothed signal. To 
avoid artifacts, smoothing was not apphed to regions of 
low data density or the end 5 kb of each chromosome. 
Peak calling within genome-wide data was performed 
using the 'peakdet' script for MATLAB (from 'www. 
billauer.co.il/peakdet. html'). To identify the enrichment 
of a single sequence motif, we used the GIbbsMarkov 
with Significance ANalysis (GIMSAN) algorithm (23). 
Motif searches were confined to windows of 40, 500 and 
1000 bp centered on confirmed ARS Consensus Sequence 
(ACS) (Supplementary Table S3), Mcm4 ChlP-seq peaks 
and HU peaks, respectively. 

RESULTS 

Using deep sequencing to measure genome replication 
dynamics 

To determine the dynamics of genome replication in 
5. cerevisiae, deep sequencing was used to measure 
DNA copy number in replicating cells relative to 
nonreplicating cells (Figure lA-D). Four different 
approaches were compared. First, checkpoint-deficient 
S. cerevisiae cells were synchronized in Gl -phase (alpha 
factor arrest) and released into S-phase in the presence of 
the ribonucleotide reductase inhibitor HU. Second, for 
maximum temporal and spatial resolution cells were syn- 
chronously released from Gl -phase and samples were 
taken during and after DNA replication. Third, 
replicating (S-phase) and nonreplicating (G2 phase) cells 
were selected by FACS. This 'sort-seq' approach does not 
require cell synchronization and therefore allows the 



analysis of both haploid and diploid strains. Fourth, 
DNA from exponential and stationary phase cultures 
was directly sequenced to detect the enrichment in copy 
number at early-replicating relative to later-replicating se- 
quences. The same genetic background (W303) was used 
throughout to allow direct comparisons between 
experiments. 

To assess the precision with which deep sequencing can 
measure changes in relative DNA copy number, two 
nonreplicating samples from our time course experiment 
were compared. In these samples, relative copy number 
could be measured with a coefficient of variation of 
<4% at a spatial resolution of 1 kb. Increasing the 
number of reads mapping per window, either by 
increasing the window size or by increasing the total 
number of sequencing reads, further reduced this coeffi- 
cient of variation (Figure IE and F). The coefficient of 
variation determined from the experimental data was 
close to the theoretical minimum for a random distribu- 
tion of reads to windows (compare data points with the 
hne in Figure IE and F). This demonstrates that deep 
sequencing offers a precise measure of relative DNA 
copy number and therefore has the potential to measure 
replication dynamics at high-resolution genome-wide. 

Genome-wide localization of replication origins 

Subjecting checkpoint-deficient cells to HU limits replica- 
tion to regions proximal to origins, aUowing the identifi- 
cation of initiation sites genome-wide (10). Haploid 
checkpoint-deficient (Arad53 Asmll) cells were arrested 
in Gl -phase and synchronously released into 200 mM 
HU. Samples were taken before release and 60 min after 
release into HU. Resulting DNA samples were subjected 
to deep sequencing. Ratios of uniquely mapped 
sequencing reads between the released and arrested 
sample were determined and normalized to a baseline of 
1.0 (Figure 2A and Supplementary Figure SI). Genomic 
locations with a ratio of one had not been replicated in the 
released sample and are therefore present at the same copy 
number as in the arrested sample. Replication results in an 
increased copy number relative to Gl -phase arrested cells 
and thus in a ratio above 1. The normalized ratios of the 
sequencing reads are a direct measure of the copy number 
and hence the amount of DNA synthesis that occurred 
during the 60 min in HU. 

Genome-wide peak calhng was performed to identify 
regions of DNA synthesis and thus potential replication 
origin locations. A total of 280 copy number peaks were 
detected with a median distance of 476 bp from previously 
defined origins (24). The HU peak locations allowed the 
identification of a sequence logo that strongly resembles 
ACS motifs determined from sites confirmed by mutagen- 
esis or from sites we identified by Mcm4 ChlP-seq 
(Figure 2B, Supplementary Figure S2 and Supplementary 
Table S3). Therefore, copy number maxima coincide with 
locations of known replication origins at a resolution suf- 
ficient to rediscover the known origin sequence motif. 

Peak heights represent the proportion of ceUs that 
replicated the sequence during the 60 min in HU. If a 
sequence was replicated in every cell of the population, 
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Figure 1. Deep sequencing measurements of DNA copy number to determine genome replication dynamics. (A) An overview of two strategies used to 
investigate genome replication. An example of S. cerevisiae chromosome 6 replication progression (top; replication origins marked by vertical ticks, 
centromere by box). (Left) Time points analyzed (bold); (right) schematic of flow cytometry profile indicating samples selected for typical sort-seq 
experiment. (B) Schematic of chromosome replication in checkpoint mutant subjected to replication stress (HU). Solid circles indicate early activating 
origins, dashed circles later activating origins. (C) MFA allows the measurement of copy number differences from exponential phase cultures (~25% 
undergoing DNA replication) relative to stationary phase cultures (no replicating cells). (D) Overview of experimental strategy and data analyses. 
Relationship between experimental noise (coefficient of variation = standard deviation/mean) and various window sizes (E) and read depths (F). In each 
case, the points represent experimental values calculated from the sequencing data and the lines represent theoretical minimum values. 
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Figure 2. DNA copy number changes in a checkpoint mutant subjected to HU. (A) DNA copy number increase is restricted to origin proximal 
regions. The plots show the ratio between uniquely mapped sequencing reads from a Gl- and S-phase sample, normalized to a baseline of 1. Regions 
lacking data are nonunique sequences where reads were not mapped. Black dots are the raw data points in 1 kb windows. Fourier transformation was 
applied to generate the smoothed profiles, shown in solid gray. Open circles represent known replication origin locations; arrows represent peak calls. 
The full genome is shown in Supplementary Figure SI. (B) HU peak calls allow rediscovery of the ORC-binding motif, the ACS. Sequence motifs 
were identified using GIMSAN from experimentally confirmed ACS (top), Mcm4 ChlP-seq peaks (middle) and HU peaks (bottom). The n refers to 
the number of motif occurrences identified (where appropriate relative to the number of peak calls). 



the copy number would reach a theoretical maximum of 2. 
A range of peak heights was observed with a maximum of 
~1.8. Differences in the peak height associated with an 
origin could reflect differences in origin activity. For 
example, origins might not be hcensed in every cell and 
therefore they would not be competent to activate in some 
cells (25). In addition, origins that activate later will ex- 
perience lower dNTP concentrations, which may result in 
origin unwinding but no nascent strand synthesis, giving 
extended regions of single-stranded DNA (26). Therefore, 
later-activating origins would be anticipated to give rise to 
lower peaks as a consequence of reduced DNA synthesis. 
Comparison between HU peak height and the median 
replication time of an origin (see below) revealed a clear 
negative correlation (Supplementary Figure S3). 
Therefore, peak heights in HU data are influenced by 
origin activation time and provide a proxy for activation 
time. 

High temporal and spatial resolution replication dynamics 

For maximum temporal and spatial resolution, we 
analyzed genome replication in a synchronous population 
of 5. cerevisiae. Haploid cells were synchronously released 
into S-phase from a Gl arrest and time points were taken 
throughout and after S-phase. Bulk genome replication 
was assessed by flow cytometry (Figure 3A) and used to 
determine the fraction of the genome replicated at each 
time point (Figure 3B). For each of the time points, the 
DNA copy number was calculated relative to Gl -phase 
and normalized for read number and the fraction of the 
genome replicated (Figure 3C) (18). 

Peaks in each time point represent genomic locations 
that are more replicated than flanking sequences — i.e. rep- 
lication origins (Figure 3D and Supplementary Figure S4). 
Early activating replication origins give rise to clear peaks 



in the first two time points (at 25 and 30min, for example, 
ARS202 and ARS211, Figure 3D), whereas later 
activating origins give rise to clear peaks in later time 
points (for example, ARS206 and ARS218, Figure 3D). 
As S-phase progresses, forks move further from their ini- 
tiation sites and collide with forks from other origins. 
Such regions are represented by valleys in the profiles. 
The flatness of the 90-min time point illustrates that rep- 
lication is complete and every part of the genome is 
equally represented at a copy number of two. 

The time at which each location has been replicated in 
half the cells (Trep) can be inferred from the time course 
data (Figure 3D and Supplementary Figure S5). The Trep 
profiles correlate with and offer increased resolution over 
previously pubhshed replication profiles (Supplementary 
Figure S6). Sharp peaks in replication profiles are a con- 
sequence of defined replication origins (25). Previously we 
have shown that smoothing replication timing data shifts 
peak positions and obscures their sharpness (25). The high 
spatial resolution of our data negates the need for 
smoothing. Therefore the sharp peak signatures of 
discrete origins can be resolved at individual time points 
and in the Trep curves (Figure 3D and E; Supplementary 
Figure S5). The Trep profiles also show smooth valleys, 
consistent with broad zones of replication termination due 
to stochastic origin activation (25,27). 

Characterizing the activity of closely spaced origins 
from genome-wide replication profiles can be difficult 
since the peaks may not be distinguishable. The high 
spatial resolution of our data enables us to resolve 
multiple peaks at clusters of origins. For example, we 
can distinguish two peaks that arise from the activity of 
a pair of origins 9 kb apart (ARS220: 623 kb and ARS221: 
632 kb) (Figure 3D) that previous studies could not 
(18,19,28). Finally, we called replication origin locations 
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Figure 3. Time course analysis of genome replication. (A) Flow cytometry of a culture synchronously released from alpha factor arrest. (B) 
Quantification (with curve fit) of flow cytometry data for subsequent use in data normalization. (C) Data normalization steps. (D) S. cerevisiae 
chromosome 2 replication dynamics from seven time points through a synchronous S-phase (each normalized to the alpha factor arrest): 25 (light 
blue), 30 (red), 35 (blue), 40 (gold), 45 (green), 50 (purple) and 90min (gray). Open circles represent known replication origin locations with six 
replication origins (as described in the text) labeled. The full genome is shown in Supplementary Figure S4. (E) Median rephcation time (Trep), 
calculated from the time course data. The full genome is shown in Supplementary Figure S5. 



from our Trep data and compared them with confirmed 
origin locations (24). Peaks in om- unsmoothed Trep data 
have a median distance of 1.2 kb from confirmed origin 
locations. In contrast origin locations called from previous 
Trep data (19) had a median distance of 3.4 kb 
(Supplementary Figure S7). 

Analysis of genome replication in highly synchronous 
cultures by deep sequencing provides the highest temporal 
and spatial resolution currently available. Here we used 
mating pheromone (alpha factor) to achieve cell cycle syn- 
chronization, but alternatives include the use of condi- 
tional mutants (19), centrifugal elutriation of small 
daughter cells (29) or release from arrest induced by 
drug/nutrient deprivation that blocks a specific cell cycle 
stage (30). Synchronization may not be possible or 



desirable in all circumstances and therefore we investi- 
gated alternative approaches. 

Sort-Seq: replication dynamics from sorted asynchronous 
cell populations 

It is possible that synchronizing cells with alpha factor 
alters the temporal dynamics of genome replication. To 
test this possibility we measured genome replication 
dynamics in an unperturbed cell population. FACS was 
used to enrich G2 phase (nonreplicating) and S-phase 
(replicating) cells from asynchronous cultures of haploid 
and diploid cells (Figure 4A). In S-phase, the copy number 
of a sequence is proportional to how early it replicates, 
whereas in G2 phase, all genomic locations are present at 
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Figure 4. Sort-seq analysis reveals identical replication dynamics in haploids and diploids. (A) Flow cytometry of an asynchronous diploid 
5. cerevisiae culture (gray fill) and resulting S-phase enriched cells (gray line) and G2 enriched cells (black line). (B) Comparison of sort-seq 
from diploid 5. cerevisiae for two biological replicates (repeat 1 used SOLiD; repeat 2 used Illumina sequencing). (C) Scatterplot of median 
replication time (Trep) versus relative copy number from sort-seq for haploid S. cerevisiae. (D) Scatterplot of relative copy number from sort-seq 
for haploid versus diploid S.cerevisiae. (E) Relative copy number replication profiles from sort-seq for S.cerevisiae chromosome 4. Dots are raw data 
points in 1 kb windows, and lines show smoothed profiles (black for diploid, gray for haploid). Bars above the profile indicate 1 kb windows that are 
significantly different (black for /'<0.001; gray for P<0.01). The full genome is shown in Supplementary Figure S9. 



equivalent copy number. The relative copy number of 
each genomic location was determined by deep sequencing 
and normalized to a baseline of 1.0, which resulted in 
observed maximum peak heights of ~2.0. Comparison 
of biological replicates demonstrated the reproducibility 
of the method (Figure 4B and Supplementary Figure S8). 

Plots of relative copy number along every chromosome 
are comparable with replication timing profiles. 
Mathematically, we have previously shown that mean 
copy number has a negative linear relationship with 
median replication time, Trep (31). Therefore, the data 
from our haploid sort-seq experiment were compared 
with the data from the time course experiment 
(Figure 4C). As anticipated, later median replication 
times corresponded to lower mean copy number values 
with a clear negative hnear relationship (correlation coef- 
ficient of —0.90; Supplementary Figure S6). Minor devi- 
ations from the linear relationship are apparent for the 
latest replicating sequences (Trep > ~45 min) consistent 
with reduced discrimination of late replication time in 
the sort-seq sample. This could result from underrepresen- 
tation of the late S-phase cells in the sorted replicating 
sample or minor contamination of the nonreplicating 
sample with late S-phase cells (Figure 4A). Comparison 
of genome-wide rephcation profiles from the sort-seq with 
median replication time from the synchronous time course 
experiment did not reveal any locations or origins with 
altered replication times. Therefore, we find no evidence 
that cell cycle synchronization with alpha factor alters 
rephcation dynamics. 

We compared the dynamics of genome replication 
for the diploid and haploid strains and observed a 
strong correlation (Figure 4D; correlation coefficient 



0.95). Genome-wide the relative replication times are vir- 
tually superimposable (Figure 4E and Supplementary 
Figure S9). To identify regions with altered replication 
dynamics, the statistical significance of differences in rep- 
lication time was calculated (P > 0.99) for each pair of 
data points. In theory, if a replication origin altered in 
activity (in a particular experimental comparison), it 
would be anticipated that a large proportion of the loci 
replicated by that origin (i.e. the replicon) would have a 
significant change in rephcation time. In contrast, if there 
were no significant changes in replication origin activity a 
more random distribution of loci with altered replication 
time would be expected, arising from experimental error. 
Random sources of experimental error should manifest 
randomly throughout the genome and differences in rep- 
lication time should be both positive and negative. By 
contrast, systematic errors (for example, due to subtle 
differences in the fraction of S-phase cells collected) 
could give rise to values that are consistently higher in 
one sample than the other and a nonrandom genomic dis- 
tribution. We observe that the diploid sample has a slight 
bias to higher values than the haploid sample, and these 
loci predominantly correspond to the earliest replicating 
sequences (data not shown). This is consistent with minor 
differences in the fractions sorted being responsible for the 
observed differences. We conclude that genome replica- 
tion dynamics in haploid and diploid strains are 
indistinguishable. 

The future: sequencing from exponentially growing cells 

The methods described above use various approaches to 
enrich for cells that are undergoing genome replication. 
These approaches can be time-consuming and have the 
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potential to introduce biases in the observed genome rep- 
lication dynamics. By contrast, in some prokaryotic 
species, a high proportion of cells within an exponentially 
growing population are undergoing genome replication. 
Consequently MFA can be used to infer replication 
dynamics. For example, we have used deep sequencing 
to characterize genome replication directly from exponen- 
tially growing cells in Escherichia coli (32) and Haloferax 
volcanii (Hawkins et ai, manuscript under review). In 
S. cerevisiae, ~25% cells are undergoing genome replica- 
tion in an exponentially growing culture (fraction of 
S-phase cells from flow cytometry data). Therefore, it is 
anticipated that in exponential phase, early replicating se- 
quences will be ~25% more abundant than late replicating 
sequences. The resolution offered by deep sequencing 
should allow sufficiently low levels of noise (see above; 
Figure IF) to measure differences in copy number 
within the range of 1-1.25. 

Haploid cells were harvested in mid-log phase 
(replicating) and stationary phase (nonreplicating), and 
DNA copy number was measured by deep sequencing. 
Owing to the anticipated experimental noise, a smoothing 
algorithm was applied to the data. The maximum 
smoothed copy number enrichment at early versus late 
replicating sequences was ~1.2, although this value will 
be an underestimate of the proportion of replicating 
cells as a consequence of smoothing (25). The resulting 
smoothed replication profile correlates with those 
obtained by sort-seq and the median replication time 
(Figure 5 and Supplementary Figure SIO). The MFA 
data has higher noise than the other approaches; 
however, it clearly demonstrates the power of deep 
sequencing to measure replication dynamics in an unper- 
turbed experiment. 



DISCUSSION 

Here we present a systematic comparison of deep 
sequencing approaches to measure the dynamics of 
genome replication in a eukaryotic model organism 
(Figure 6). These methods measure the changes in DNA 
copy number resulting from DNA synthesis. These data 
provide a quantitative measure of the fraction of cells that 



have replicated a particular sequence in each sample. This 
approach is in contrast to a number of alternatives (e.g. 
BrdU Chip or accumulation of single-stranded DNA) 
where the signal has no direct quantitative meaning, 
making comparisons between samples difficult. However, 
the challenge for DNA copy number-based approaches is 
the low signal-to-noise ratio, with a maximum difference 
of 2-fold between early- and late-replicating sequences, 
and with a significantly reduced range when using MFA. 
Here we demonstrate that deep sequencing can provide 
high temporal and spatial resolution replication profiles. 

Application of these methods to other organisms will 
need consideration of the genome size, nature of genome 
replication and availability of cell cycle synchronization 
techniques. Obtaining >1000 reads per kb allowed us to 
achieve ~5% standard deviation at 1 kb resolution. 
Therefore, large mammalian genomes will require propor- 
tionally more reads. For example, the sort-seq method- 
ology has already been applied to cultured human cells 
with generation of ~100 reads per kb (33). Larger 
genomes are generally replicated over a longer period 
(i.e require longer S-phases), which, coupled with the ob- 
servation that in some organisms individual origins 
activate in only a small fraction of cell cycles, presents 
additional challenges. Surmounting these challenges may 
require the sorting of multiple S-phase fractions or the use 
of DNA replication inhibitors. 

We used HU to limit replication and allow the precise 
detection of replication initiation sites (Figure 6A and 
Supplementary Figure SI). The accuracy of the identified 
origin locations was sufficient to rediscover the ORC- 
binding motif. The highly quantitative measure of the 
degree and extent of DNA synthesis from each origin 
aUows not only direct comparisons between origins, but 
will also permit direct comparisons between strains and 
conditions. Furthermore, origin peak heights in HU 
provide an indirect measure of origin replication time. 

The time course data from a synchronous cell popula- 
tion gave replication dynamics with the highest spatial and 
temporal resolution (Figure 6B and C), but required cell 
cycle synchronization and the sequencing of several 
samples per experiment. The sort-seq approach offers a 
compromise between resolution, ease of use and the 
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Figure 5. Direct measurement of genome replication in exponentially growing cells. (A) MFA as a proxy for replication time. Gray dots are the raw 
data points (in 1 kb windows), and the line shows the smoothed profile (left j-axis scale). Black dots show relative copy number from the (lUumina) 
sort-seq experiment (right i-axis scale). Open circles and vertical bars represent known replication origin locations. The full genome is shown in 
Supplementary Figure SIO. (B) Comparison of MFA and median replication time (Trep). 
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Figure 6. Juxtaposition of chromosome 7 data from the four copy number approaches to investigating genome replication. (A) DNA replication in a 
checkpoint mutant subjected to HU. (B) A time course through a synchronous S-phase (key as in legend to Figure 3). (C) Median replication time 
(Trep) as calculated from the time course data. (D) Sort-seq to directly compare the replication time of a haploid (gray) and diploid (black). (E) 
MFA as a proxy for replication time. Open circles represent known replication origin locations; blue bars represent the location of the centromere. 



number of samples to be sequenced; furthermore, there is 
no requirement for strain manipulation or cell cycle per- 
turbation. Although there is no direct time measurement, 
the median rephcation time of each sequence can be 
inferred based on the length of S-phase, which in turn 
can be approximated from the proportion of S-phase 
cells in the population and the doubhng time (20). 
Therefore, care should be taken when making direct com- 
parisons between sort-seq data from strains or organisms 



with differences in the duration of S-phase. We have suc- 
cessfully applied sort-seq to a range of yeast species and 
strains (9,34). Here we use this approach to demonstrate 
for (as far as we are aware) the first time that genome 
replication in haploids and diploids is indistinguishable 
(Figure 6D). 

Finally, we present the first whole-genome application 
of MFA in a eukaryote (Figure 6E). The removal of pro- 
cessing steps (e.g. FACS) avoids the minor biases that they 
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can introduce (Supplementary Figure S6). As with sort- 
seq, the MFA offers no direct measure of time, but the 
resulting data correlate with replication time (Supplemen- 
tary Figure S6). The low signal (relative to the current 
levels of experimental noise) in MFA will require a signifi- 
cant increase in the number of sequence reads if it is to 
provide the resolution currently available with sort-seq. 
This will particularly be the case in organisms where 
there are less dramatic differences in mean replication 
time across the genome, for example, for genomes 
replicated from many low-efficiency origins. Therefore, 
with current sequencing costs, we prefer approaches that 
enrich for cells undergoing genome replication. However, 
we foresee that direct sequencing of DNA samples from 
exponentially growing cells might become the most appro- 
priate approach as sequencing costs decrease and read 
numbers increase. For example, for many experimental 
systems, highly synchronized populations cannot be 
achieved, and in some cases, sorting might not be technic- 
ally possible or desirable (e.g. when there are difficulties in 
defining the S-phase ceUs). The use of exponentially 
growing cells involves the least effort, minimizes perturb- 
ation and can be apphed to any cultivable organism. Thus, 
it may be that one of the earliest methods for analyzing 
chromosome repHcation (35) will in the future become the 
approach of choice owing to the dramatic developments in 
DNA sequencing. 



Data access 

Mcm4 ChlP-seq sequence data has been deposited at the 
NCBI Short Read Archive under accession number 
SRX341889. The rephcation timing profiles and deep 
sequencing data have been submitted to the NCBI Gene 
Expression Omnibus (GEO) (http://www.ncbi.nlm.nih. 
gov/geo/) under accession numbers GSE42243 and 
GSE48212. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Onhne, 
including [36-54]. 
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